Team:
This presentation was compiled from a Jupyter Notebook using the RISE package.
Exploratory Data Analysis, MOVS, HSE University
28.12.2021
Data: airline tickets for small Russian cities that have airports.
Airport list: obtained from unipage.net by parsing its HTML pages. The target airports are those of size S and M in the unipage classification.
Airline tickets: for every airport obtained, ticket information was collected for the period from 2021 to 2022. Data source: the Aviasales API.
%env AVIASALES_TOKEN="YOUR TOKEN"
env: AVIASALES_TOKEN="YOUR TOKEN"
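One detail worth noting about the cell above: IPython's `%env VAR=value` stores the value literally, so the surrounding quotes become part of the variable, as the echoed output shows. How `get_tickets` reads the token is not shown here; the following is only a minimal sketch of a defensive reader, and the function name `read_token` is a hypothetical one introduced for illustration.

```python
import os


def read_token(name: str = "AVIASALES_TOKEN") -> str:
    """Return the token stored in the environment variable `name`.

    Hypothetical helper (not part of the original project): it strips
    whitespace and any surrounding quotes left over from `%env`.
    """
    raw = os.environ.get(name)
    if raw is None:
        raise RuntimeError(f"Environment variable {name} is not set")
    token = raw.strip().strip('"').strip("'")
    if not token:
        raise RuntimeError(f"Environment variable {name} is empty")
    return token
```

Writing `%env AVIASALES_TOKEN=YOUR_TOKEN` without quotes avoids the issue in the first place.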
from parser.data_processor import get_tickets

df = get_tickets(num_airports=1000,
                 airport_size=["M", "S"])
Parsing available airports...
100%|█████████████████████████████████████████| 101/101 [00:44<00:00, 2.26it/s]
Parsing tickets...
100%|███████████████████████████████████████| 3655/3655 [20:05<00:00, 3.03it/s]
Total count for: 13316
df.head(3)
| | origin | destination | price | airline | flight_number | departure_at | return_at | transfers | expires_at | datetime | origin_city | origin_country | destination_city | destination_country |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | MRV | VOG | 94 | UT | 322 | 2021-12-05T16:35:00+03:00 | 2021-12-06T14:55:00+03:00 | 1 | 2021-12-06T18:21:59Z | 2021-12-05 | Mineralnyye Vody | Russia | Volgograd | Russia |
| 1 | MRV | VOG | 77 | A4 | 357 | 2021-12-06T13:45:00+03:00 | 2021-12-13T14:55:00+03:00 | 1 | 2021-12-06T18:21:59Z | 2021-12-06 | Mineralnyye Vody | Russia | Volgograd | Russia |
| 2 | MRV | VOG | 82 | DP | 6916 | 2021-12-07T20:40:00+03:00 | 2021-12-10T14:55:00+03:00 | 1 | 2021-12-06T18:21:59Z | 2021-12-07 | Mineralnyye Vody | Russia | Volgograd | Russia |
We compute the coordinates of the origin and destination cities and the distance between the two endpoints of each route.
import numpy as np
import pandas as pd
from geopy.geocoders import Nominatim
from geopy import Point, distance
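def create_point(row, key):
    """Build a geopy Point for the `key` city, or None if its coordinates are missing."""
    lat, lon = row[f'{key}_city_lat'], row[f'{key}_city_lon']
    if pd.isna(lat) or pd.isna(lon):
        return None
    return Point(latitude=float(lat), longitude=float(lon))


def find_distance(x):
    """Distance in km between the origin and destination points (NaN if either is missing)."""
    if x['origin_point'] is None or x['destination_point'] is None:
        return np.nan
    try:
        return distance.distance(x['destination_point'], x['origin_point']).km
    except Exception:
        return np.nan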
def add_location_metadata(df: pd.DataFrame) -> pd.DataFrame:
    """
    Add city coordinates, place type, country and route distance (km) to the dataframe.
    """
    geolocator = Nominatim(user_agent='andrew_v')
    all_cities = list(set(df['origin_city'].unique()) | set(df['destination_city'].unique()))
    cities_lat_dict, cities_lon_dict, cities_type_dict, cities_country_dict = {}, {}, {}, {}
    not_found = []
    # process_city is defined elsewhere (not shown here); note that the call
    # below passes neither `city` nor `geolocator`
    for city in all_cities:
        process_city(cities_lat_dict, cities_lon_dict, cities_type_dict, cities_country_dict, not_found)
    # Map the per-city lookups onto both ends of each route
    df['origin_city_lat'] = df['origin_city'].map(cities_lat_dict)
    df['origin_city_lon'] = df['origin_city'].map(cities_lon_dict)
    df['origin_city_type'] = df['origin_city'].map(cities_type_dict)
    df['origin_city_country'] = df['origin_city'].map(cities_country_dict)
    df['destination_city_lat'] = df['destination_city'].map(cities_lat_dict)
    df['destination_city_lon'] = df['destination_city'].map(cities_lon_dict)
    df['destination_city_type'] = df['destination_city'].map(cities_type_dict)
    df['destination_city_country'] = df['destination_city'].map(cities_country_dict)
    # Build geopy points and compute the route distance row by row
    df['destination_point'] = df.apply(lambda x: create_point(x, 'destination'), axis=1)
    df['origin_point'] = df.apply(lambda x: create_point(x, 'origin'), axis=1)
    df['distance_origin_destination_km'] = df.apply(find_distance, axis=1)
    return df
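The `process_city` helper called inside `add_location_metadata` is not shown in this section. Purely as an illustration of what such a per-city step could look like, here is a minimal sketch under stated assumptions: the name `process_city_sketch`, its signature, and the way the country is taken from the last part of `display_name` are all hypothetical and are not the authors' implementation. The `geocode` argument is any callable that returns an object with `latitude`, `longitude` and `raw` attributes (as geopy's `Nominatim.geocode` does) or `None`; a stub with made-up address text stands in for it here so that the example runs offline.

```python
from types import SimpleNamespace


def process_city_sketch(city, geocode, lat, lon, kind, country, not_found):
    """Geocode one city and record the result in the dictionaries in place.

    Hypothetical sketch: cities that cannot be geocoded go to `not_found`.
    """
    try:
        location = geocode(city)
    except Exception:
        location = None
    if location is None:
        not_found.append(city)
        return
    lat[city] = location.latitude
    lon[city] = location.longitude
    raw = getattr(location, "raw", None) or {}
    kind[city] = raw.get("type")
    # Assumption: the country is the last comma-separated part of display_name
    display_name = raw.get("display_name", "")
    country[city] = display_name.split(",")[-1].strip() or None


def fake_geocode(name):
    """Offline stand-in for a real geocoder; knows a single city."""
    if name == "Volgograd":
        return SimpleNamespace(
            latitude=48.7081906,
            longitude=44.5153353,
            raw={"type": "city", "display_name": "Волгоград, Россия"},
        )
    return None


lat, lon, kind, country, not_found = {}, {}, {}, {}, []
for city in ["Volgograd", "Nowhere-ville"]:
    process_city_sketch(city, fake_geocode, lat, lon, kind, country, not_found)
```

With geopy, `geocode` could be `geolocator.geocode`, or that method wrapped in `geopy.extra.rate_limiter.RateLimiter` with `min_delay_seconds=1` to stay within Nominatim's request rate.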
df = add_location_metadata(df)
df.head(3)
| | Unnamed: 0 | origin | destination | price | airline | flight_number | departure_at | return_at | transfers | expires_at | ... | origin_city_lon | origin_city_type | origin_city_country | destination_city_lat | destination_city_lon | destination_city_type | destination_city_country | destination_point | origin_point | distance_origin_destination_km |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | MRV | VOG | 94 | UT | 322 | 2021-12-05T16:35:00+03:00 | 2021-12-06T14:55:00+03:00 | 1 | 2021-12-06T18:21:59Z | ... | 43.0909294469511 | village | Россия | 48.7081906 | 44.5153353 | city | Россия | 48 42m 29.4862s N, 44 30m 55.2071s E | 52 46m 1.47s N, 43 5m 27.346s E | 462.558569 |
| 1 | 1 | MRV | VOG | 77 | A4 | 357 | 2021-12-06T13:45:00+03:00 | 2021-12-13T14:55:00+03:00 | 1 | 2021-12-06T18:21:59Z | ... | 43.0909294469511 | village | Россия | 48.7081906 | 44.5153353 | city | Россия | 48 42m 29.4862s N, 44 30m 55.2071s E | 52 46m 1.47s N, 43 5m 27.346s E | 462.558569 |
| 2 | 2 | MRV | VOG | 82 | DP | 6916 | 2021-12-07T20:40:00+03:00 | 2021-12-10T14:55:00+03:00 | 1 | 2021-12-06T18:21:59Z | ... | 43.0909294469511 | village | Россия | 48.7081906 | 44.5153353 | city | Россия | 48 42m 29.4862s N, 44 30m 55.2071s E | 52 46m 1.47s N, 43 5m 27.346s E | 462.558569 |
3 rows × 26 columns
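As a rough cross-check of the `distance_origin_destination_km` column, the distance can be approximated with the haversine (great-circle) formula. This is a sketch on a spherical Earth, not the method used above: `distance.distance` in recent geopy versions computes a geodesic distance on an ellipsoid, so small differences are expected. The function names and the radius constant are introduced here for illustration; the coordinates are the ones printed in the first row of the table.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def dms_to_decimal(degrees, minutes, seconds):
    """Convert degrees, minutes, seconds (northern/eastern hemisphere) to decimal degrees."""
    return degrees + minutes / 60 + seconds / 3600


# origin_point from the table: 52 46m 1.47s N, 43 5m 27.346s E
origin = (dms_to_decimal(52, 46, 1.47), dms_to_decimal(43, 5, 27.346))
# destination_city_lat / destination_city_lon from the table
destination = (48.7081906, 44.5153353)

approx_km = haversine_km(*origin, *destination)
print(f"haversine estimate: {approx_km:.1f} km")
```

The estimate should land within a few kilometres of the 462.558569 km reported in the table; a larger gap would point to swapped or mis-parsed coordinates.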
# zoom out on the map to see all departure points
show_circles_on_map(df, "origin_city_lat", "origin_city_lon", "blue")